Pash 2.0: Scaleable Sequence Anchoring for Next-Generation Sequencing Technologgies
نویسندگان
چکیده
Many applications of next-generation sequencing technologies involve anchoring of a sequence fragment or a tag onto a corresponding position on a reference genome assembly. Positional Hashing method, implemented in the Pash 2.0 program, is specifically designed for the task of high-volume anchoring. In this article we present multi-diagonal gapped kmer collation and other improvements introduced in Pash 2.0 that further improve accuracy and speed of Positional Hashing. The goal of this article is to show that gapped kmer matching with cross-diagonal collation suffices for anchoring across close evolutionary distances and for the purpose of human resequencing. We propose a benchmark for evaluating the performance of anchoring programs that captures key parameters in specific applications, including duplicative structure of genomes of humans and other species. We demonstrate speedups of up to tenfold in large-scale anchoring experiments achieved by PASH 2.0 when compared to BLAT, another similarity search program frequently used for anchoring.
منابع مشابه
Pash: efficient genome-scale sequence anchoring by Positional Hashing.
Pash is a computer program for efficient, parallel, all-against-all comparison of very long DNA sequences. Pash implements Positional Hashing, a novel parallelizable method for sequence comparison based on k-mer representation of sequences. The Positional Hashing method breaks the comparison problem in a unique way that avoids the quadratic penalty encountered with other sensitive methods and c...
متن کاملComparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing
Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each o...
متن کاملStrategies and Clinical Applications of Next Generation Sequencing
Abstract DNA sequencing is one of the great valuable techniques in molecular biology, which can be used to detect the sequence of nucleotides in a DNA fragment. The high-throughput sequencing known as Next Generation Sequencing (NGS) revolutionized genomic research and molecular biology; therefore, the whole human genome can be sequenced with a low cost in several days. NGS technology is simi...
متن کاملStrategies and Clinical Applications of Next Generation Sequencing
Abstract DNA sequencing is one of the great valuable techniques in molecular biology, which can be used to detect the sequence of nucleotides in a DNA fragment. The high-throughput sequencing known as Next Generation Sequencing (NGS) revolutionized genomic research and molecular biology; therefore, the whole human genome can be sequenced with a low cost in several days. NGS technology is simi...
متن کاملUsing GBrowse 2.0 to visualize and share next-generation sequence data
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites. It supports most of genome browser features, including qualitative and quantitative (wiggle) tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2.0, GBrowse supports next-generation sequencing (N...
متن کامل